ABSTRACT. In the 1960s Shiryaev developed the Bayesian theory of change detection in independent and identically distributed (i.i.d.) sequences. In Shiryaev's classical setting the goal is to minimize an average detection delay under the constraint imposed on the average probability of false alarm. Recently, Tartakovsky and Veeravalli (2005) developed a general Bayesian asymptotic change-point detection theory (in the classical setting) that is not limited to a restrictive i.i.d. assumption. It was proved that Shiryaev's detection procedure is asymptotically optimal under the traditional average false alarm probability constraint, assuming that this probability is small. In the present paper, we consider a less conventional approach where the constraint is imposed on the global, supremum false alarm probability. An asymptotically optimal Bayesian change detection procedure is proposed and thoroughly evaluated for both i.i.d. and non-i.i.d. models when the global false alarm probability approaches zero.

Keywords and Phrases: Bayesian change-point detection, sequential detection, asymptotic optimality, global false alarm probability, nonlinear renewal theory, non-i.i.d. observations, r-quick convergence.

1. Introduction

The classical change-point detection problem deals with the i.i.d. case where there is a sequence of observations $X_1, X_2, \dots$ that are identically distributed with a probability density function (pdf) $f_0(x)$ for $n < \lambda$ and with a pdf $f_1(x)$ for $n \ge \lambda$, where $\lambda$, $\lambda = 1, 2, \dots$, is an unknown point of change. In other words, the joint pdf of the vector $X_1^n = (X_1, \dots, X_n)$ conditioned on $\lambda = k$ has the form

$$p(X_1^n \mid \lambda = k) = \begin{cases} \prod_{i=1}^{k-1} f_0(X_i) \prod_{i=k}^{n} f_1(X_i), & \text{if } k \le n, \\ \prod_{i=1}^{n} f_0(X_i), & \text{if } k > n. \end{cases} \tag{1.1}$$

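More generally, the observations may be nonidentically distributed or correlated or both, i.e., non-i.i.d. In the most general non-i.i.d. case the model can be described as follows:

$$p(X_1^n \mid \lambda = k) = \begin{cases} \prod_{i=1}^{k-1} f_0(X_i \mid X_1^{i-1}) \prod_{i=k}^{n} f_1(X_i \mid X_1^{i-1}), & \text{if } k \le n, \\ \prod_{i=1}^{n} f_0(X_i \mid X_1^{i-1}), & \text{if } k > n, \end{cases} \tag{1.2}$$

where $f_0(X_i \mid X_1^{i-1})$ and $f_1(X_i \mid X_1^{i-1})$ are conditional densities for $X_i$ given $X_1^{i-1} = (X_1, \dots, X_{i-1})$ that may depend on $i$. In addition, the post-change pdf $f_1(X_i \mid X_1^{i-1})$ may depend on the point of change $k$.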

A change-point detection procedure $\tau$ is a stopping time with respect to the sequence of sigma-algebras $\mathcal{F}_n = \sigma(X_1^n)$, $n \ge 1$, i.e., $\{\tau \le n\} \in \mathcal{F}_n$, $n \ge 0$.

Let, for any $\lambda = k < \infty$, $P_k$ ($E_k$) be the probability measure (expectation) under which the conditional pdf of $X_n$ is $f_0(X_n \mid X_1^{n-1})$ if $n \le k - 1$ and is $f_1(X_n \mid X_1^{n-1})$ if $n \ge k$. If $\lambda = \infty$, i.e., when the change does not occur, $P_\infty$ ($E_\infty$) is the probability measure (expectation) under which the conditional pdf of $X_n$ given $X_1^{n-1}$ is $f_0(X_n \mid X_1^{n-1})$ for every $n \ge 1$.

For $\lambda = k$, a true detection happens when $\tau \ge k$ and a false alarm when $\tau < k$. The design of quickest change detection procedures involves optimizing the tradeoff between a "risk" $R_k(\tau)$ related to the detection delay $(\tau - k)^+$ and a loss $L_k(\tau)$ due to a false alarm. Possible risk functions are $R_k(\tau) = E_k(\tau - k \mid \tau \ge k)$ and $R_k(\tau) = \operatorname{ess\,sup} E_k[(\tau - k)^+ \mid \mathcal{F}_{k-1}]$. The first one was introduced by Pollak [21] and the second one by Lorden [17]. The loss $L_k(\tau)$ can be measured by the mean time to false alarm $E_k[\tau \mathbf{1}_{\{\tau < k\}}]$ or by the probability of false alarm (PFA) $P_k(\tau < k)$. Note that since $\{\tau < k\} \in \mathcal{F}_{k-1}$,

$$P_k(\tau < k) = P_\infty(\tau < k) \quad \text{and} \quad E_k[\tau \mathbf{1}_{\{\tau < k\}}] = E_\infty[\tau \mathbf{1}_{\{\tau < k\}}]. \tag{1.3}$$

Therefore, the requirements of controlling the PFA $P_k(\tau < k)$ and the mean time to false alarm $E_k[\tau \mathbf{1}_{\{\tau < k\}}]$ for all $k \ge 1$ are equivalent to controlling $\sup_k P_\infty(\tau < k) = P_\infty(\tau < \infty)$ and $E_\infty \tau$, respectively. Note that the requirement of having $P_\infty(\tau < \infty) \le \alpha$, $\alpha < 1$, leads to $E_\infty \tau = \infty$, and the requirement $E_\infty \tau = \gamma$, $\gamma < \infty$, leads to $P_\infty(\tau < \infty) = 1$.

Under the constraint on the mean time to false alarm $E_\infty \tau \ge \gamma$, $\gamma > 0$, a uniformly optimal detection procedure that minimizes the average detection delay $E_k(\tau - k \mid \tau \ge k)$ or $\operatorname{ess\,sup} E_k[(\tau - k)^+ \mid \mathcal{F}_{k-1}]$ for all $k \ge 1$ does not exist and one has to resort to the minimax setting of minimizing $\sup_k R_k(\tau)$. In the i.i.d. case, Lorden [17] showed that the CUSUM detection test is asymptotically optimal with respect to the essential supremum speed of detection measure $\sup_k \operatorname{ess\,sup} E_k[(\tau - k)^+ \mid \mathcal{F}_{k-1}]$ for low false alarm rate as $\gamma \to \infty$. Later, Moustakides [19] improved this result showing that the CUSUM test is actually exactly optimal for all $\gamma > 0$ if the threshold can be chosen in such a way that $E_\infty \tau = \gamma$. See also Ritov [22] for an alternative proof of this property. More recently, Shiryaev [26] and Beibel [5] proved the same result for the problem of detecting a change in the mean value of a continuous-time Brownian motion. Pollak [21] introduced the Shiryaev-Roberts test randomized at the initial point, which will be referred to as the Shiryaev-Roberts-Pollak (SRP) test, and proved that this test is nearly optimal with respect to $\sup_k E_k(\tau - k \mid \tau \ge k)$ as $\gamma \to \infty$. Further, Lai [16] and Tartakovsky [31] proved that both of these detection tests are asymptotically (first order) optimal as $\gamma \to \infty$ for fairly general non-i.i.d. models. More recently, Fuh [9, 10] proved asymptotic optimality of the CUSUM and SRP procedures for hidden Markov models.

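Specifically, let $Z_n^k$ denote the log-likelihood ratio between the hypotheses "$H_k : \lambda = k$" and $H_\infty$,

$$Z_n^k = \sum_{i=k}^{n} \log \frac{f_1(X_i \mid X_1^{i-1})}{f_0(X_i \mid X_1^{i-1})}, \quad k \le n, \tag{1.4}$$

and assume that $(n - k)^{-1} Z_n^k \to q$ almost surely (a.s.) as $n \to \infty$ under $P_k$, where $q$ is a positive and finite number. Assuming in addition a certain rate of convergence in the above strong law, it follows from [16, 31] that

$$\liminf_{\gamma \to \infty} \frac{\inf_{\{\tau : E_\infty \tau \ge \gamma\}} \sup_k E_k(\tau - k \mid \tau \ge k)}{\log \gamma} \ge \frac{1}{q},$$

which is attained for CUSUM and SRP tests with the threshold $h = \log \gamma$.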
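
To make the constant-threshold benchmark concrete, the following minimal Python sketch implements the standard CUSUM recursion W_n = max(0, W_{n-1} + dZ_n), stopping at the first n with W_n >= h. The function name `cusum_stopping_time` and the convention of passing precomputed log-likelihood ratio increments are assumptions made for illustration only; they do not come from the cited works.

```python
def cusum_stopping_time(llr_increments, h):
    """First n (1-based) at which the CUSUM statistic reaches h, else None.

    llr_increments: iterable of per-observation log-likelihood ratio
    increments log(f1(x_n) / f0(x_n)) (hypothetical input convention).
    h: constant threshold, e.g. h = log(gamma) as in the text.
    """
    w = 0.0
    for n, dz in enumerate(llr_increments, start=1):
        # Reflect at zero: the statistic restarts whenever it would go negative.
        w = max(0.0, w + dz)
        if w >= h:
            return n
    return None  # no alarm on this finite sample
```

Because the threshold is constant and the statistic restarts at zero, this rule raises an alarm with probability one even when no change occurs, which is the behaviour that the global false alarm constraint rules out.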

Further generalizations to composite hypotheses, nonparametric problems, multipopulation problems, multisensor distributed change detection problems, as well as detailed discussions of several challenging application areas were presented in Tartakovsky [29, 32], Tartakovsky et al. [34], and Tartakovsky and Veeravalli [35].

On the other hand, for the standard CUSUM and SRP tests (with constant thresholds), the "global" PFA $P_\infty(\tau < \infty) = 1$. To guarantee the condition $P_\infty(\tau < \infty) \le \alpha$ for $\alpha < 1$ in these latter tests, one may use a curved stopping boundary that increases in time in place of the constant threshold. Borovkov [7] proved that the CUSUM and SRP tests with certain curved thresholds are asymptotically optimal for i.i.d. data models with respect to the conditional average detection delay (ADD) $E_k(\tau - k \mid \tau \ge k)$ as $k \to \infty$. It follows from the latter work that when $k$ is large, the conditional ADD of these procedures increases as $O(\log k)$. This happens because of the very strong supremum probability constraint. Therefore, under this constraint neither minimax nor uniform solutions are feasible in the asymptotic setting when $\alpha \to 0$, since for any small $\alpha$ there exists a large $k$ that cannot be neglected. We argue that under the constraint imposed on the global (supremum) PFA the only feasible solution is Bayesian. Indeed, in the Bayesian setting, due to averaging, the increasing threshold generates a constant term that can be neglected when $\alpha$ is small.

If, however, the false alarm rate is measured in terms of the local PFA $\sup_k P_\infty(k \le \tau \le k + T - 1)$ or by the local conditional PFA $\sup_k P_\infty(k \le \tau \le k + T - 1 \mid \tau \ge k)$ in some time-window $T$, which may go to infinity at a certain rate, then the CUSUM and SRP detection tests have uniformly asymptotically optimal properties, i.e., they minimize the conditional ADD $E_k(\tau - k \mid \tau \ge k)$ for every $k \ge 1$ (cf. Lai [15, 16] and Tartakovsky [32]).

In Shiryaev's classical Bayesian setting (see Shiryaev [23]-[25] and Peskir and Shiryaev [20]), there is a prior distribution $\pi_k = P(\lambda = k)$, $k \ge 0$, and the constraint is imposed on the average false alarm probability

$$P^\pi(\tau < \lambda) = \sum_{k=1}^{\infty} \pi_k P_k(\tau < k),$$

i.e., $P^\pi(\tau < \lambda) \le \alpha$, $\alpha < 1$. The goal is to find an optimal procedure that minimizes the average detection delay

$$E^\pi(\tau - \lambda)^+ = \sum_{k=0}^{\infty} \pi_k E_k(\tau - k)^+$$

in the class of procedures $\{\tau : P^\pi(\tau < \lambda) \le \alpha\}$ or an asymptotically optimal procedure that minimizes the delay when $\alpha \to 0$ (see Tartakovsky and Veeravalli [36] and Baron and Tartakovsky [2]). Here $P^\pi$ ($E^\pi$) is the average probability measure (expectation) defined as $P^\pi(\Omega) = \sum_{k=0}^{\infty} \pi_k P_k(\Omega)$.

Shiryaev [25] proved that the stopping time

$$\nu_B = \min\{n : P(\lambda \le n \mid \mathcal{F}_n) \ge B\} \tag{1.5}$$

is optimal in the i.i.d. case and for the geometric prior distribution if the threshold $B = B_\alpha$ is chosen so that $P^\pi(\nu_B < \lambda) = \alpha$. Yakir [38] generalized this result for Markov models. Recently, Tartakovsky and Veeravalli [36] and Baron and Tartakovsky [2] proved that the Shiryaev stopping time with the threshold $B_\alpha = 1 - \alpha$ is asymptotically optimal as $\alpha \to 0$ for a wide class of prior distributions and non-i.i.d. models under very general conditions. Moreover, it follows from [2, 36] that the Shiryaev detection test minimizes (asymptotically) not only the average detection delay $E^\pi(\tau - \lambda)^+$ but also higher positive moments of the detection delay $E^\pi[(\tau - \lambda)^m \mid \tau \ge \lambda]$, $m \ge 1$.

Note once again that the event $\{\tau < k\}$ belongs to the sigma-field $\mathcal{F}_{k-1} = \sigma(X_1^{k-1})$, which implies $P_k(\tau < k) = P_\infty(\tau < k)$. Therefore,

$$P^\pi(\tau < \lambda) = \sum_{k=1}^{\infty} \pi_k P_\infty(\tau < k). \tag{1.6}$$

Another possibility is to impose a stronger, supremum constraint

$$\sup_k P_k(\tau < k) = \sup_k P_\infty(\tau < k) = P_\infty(\tau < \infty) \le \alpha,$$

i.e., to consider the class of stopping times $\Delta_\infty(\alpha) = \{\tau : P_\infty(\tau < \infty) \le \alpha\}$ for which the worst-case (global) false alarm probability $\sup_k P_k(\tau < k)$ is restricted by the given number $\alpha < 1$. The goal is to find an optimal procedure from the following optimization problem:

$$\inf_{\tau \in \Delta_\infty(\alpha)} E^\pi(\tau - \lambda)^+ \to \tau_{\mathrm{opt}}.$$

As we already mentioned above, the minimax solution is not feasible under this strong constraint: the minimax delay is infinitely large. We believe that the only feasible solution is Bayesian. However, see Assaf et al. [1] and Remark 1 in Section 6 regarding a dynamic sampling technique in minimax problems.

In this paper, we are interested in the latter optimization problem. However, it is difficult to find an exact solution to this optimization problem even in the i.i.d. case. For this reason, we focus on the asymptotic problem, letting $\alpha$ go to zero. Since $E^\pi(\tau - \lambda \mid \tau \ge \lambda) = E^\pi(\tau - \lambda)^+ / P^\pi(\tau \ge \lambda)$ and, by (1.6), $P^\pi(\tau \ge \lambda) \ge 1 - P_\infty(\tau < \infty)$, this latter asymptotic problem is equivalent to minimizing the average detection delay (ADD) of the form

$$\inf_{\tau \in \Delta_\infty(\alpha)} E^\pi(\tau - \lambda \mid \tau \ge \lambda) \quad \text{as } \alpha \to 0.$$

Moreover, we will address the problem of minimizing higher moments of the detection delay

$$\inf_{\tau \in \Delta_\infty(\alpha)} E^\pi[(\tau - \lambda)^m \mid \tau \ge \lambda], \quad m > 1, \quad \text{as } \alpha \to 0.$$

We will write $\mathrm{ADD}^\pi(\tau) = E^\pi(\tau - \lambda \mid \tau \ge \lambda)$ and $D_m^\pi(\tau) = E^\pi[(\tau - \lambda)^m \mid \tau \ge \lambda]$ for brevity.

Beibel [6] considered a purely Bayesian problem for the Brownian motion with the risk function $c E^\pi(\tau - \lambda)^+ + P_\infty(\tau < \infty)$ when the cost of detection delay $c$ goes to zero and the loss due to the false alarm is measured by $P_\infty(\tau < \infty)$.

In the present paper, we show that the techniques developed in [2, 8, 16, 30, 36] can be effectively used for studying asymptotic properties of change-point detection tests in the class $\Delta_\infty(\alpha)$ when $\alpha \to 0$ for general stochastic models.

2. The Detection Procedure

Let "$H_k : \lambda = k$" and "$H_\infty : \lambda = \infty$" denote the hypotheses that the change occurs at the point $\lambda = k$ ($k < \infty$) and does not occur, respectively. The likelihood ratio between these hypotheses based on the observation vector $X_1^n = (X_1, \dots, X_n)$ is

$$\Lambda_n^k := \frac{p(X_1^n \mid \lambda = k)}{p(X_1^n \mid \lambda = \infty)} = \prod_{i=k}^{n} \frac{f_1(X_i \mid X_1^{i-1})}{f_0(X_i \mid X_1^{i-1})}, \quad k \le n$$

(see (1.2)); for $k > n$ the ratio equals 1.

We will always use the convention that for $n = 0$, i.e., before the observations become available, $\Lambda_0^0 = f_1(X_0)/f_0(X_0) = 1$ almost everywhere. For the sake of convenience and with very little loss of generality, we will also assume that $\pi_0 = 0$. Since $\Lambda_0^0 = 1$, the likelihood ratios $\Lambda_n^0$ and $\Lambda_n^1$ are equal, which means that the hypotheses $\lambda = 0$ and $\lambda = 1$ are not distinguishable and, therefore, introducing a positive mass at the point $\lambda = 0$ has little practical meaning. Generalization to the case where $\pi_0 > 0$ is straightforward.

Define the statistic

$$G_n = \sum_{k=1}^{\infty} \pi_k \Lambda_n^k,$$

which is nothing but the average likelihood ratio of the hypotheses $H_k$ and $H_\infty$, and introduce the stopping time

$$\tau_A = \min\{n \ge 1 : G_n \ge A\}, \quad A > 1. \tag{2.1}$$

Note that the statistic $G_n$ can be represented in the following form:

$$G_n = \sum_{k=1}^{n} \pi_k e^{Z_n^k} + \Pi_{n+1}, \quad n \ge 0, \tag{2.2}$$

where $\Pi_{n+1} = P(\lambda \ge n + 1)$ and $Z_n^k = \log \Lambda_n^k$ is the log-likelihood ratio (LLR) between the hypotheses $H_k$ and $H_\infty$ given in (1.4).

It is useful to establish a relationship between the detection procedure $\tau_A$ and Shiryaev's stopping time $\nu_B$ defined in (1.5). Making use of the Bayes formula and (2.2), we obtain

$$P(\lambda \le n \mid X_1^n) = \frac{\sum_{k=1}^{n} \pi_k\, p(X_1^n \mid \lambda = k)}{\sum_{k=1}^{\infty} \pi_k\, p(X_1^n \mid \lambda = k)} = \frac{\sum_{k=1}^{n} \pi_k e^{Z_n^k}}{G_n} = \frac{G_n - \Pi_{n+1}}{G_n},$$

which shows that the stopping time $\tau_A$ can be written as

$$\tau_A = \min\{n \ge 1 : P(\lambda \le n \mid X_1^n) \ge 1 - \Pi_{n+1}/A\}, \quad A > 1.$$

Therefore, while in Shiryaev's test the posterior probability $P(\lambda \le n \mid X_1^n)$ is compared to a constant threshold, in the proposed detection test the threshold is an increasing function of $n$. This is an unavoidable penalty for the very strong supremum PFA constraint.
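
As an illustration of the procedure in this section, the sketch below computes the average likelihood ratio statistic recursively and records the posterior probability of a change at each step, so that the equivalence of the two stopping rules can be checked numerically. It assumes a specific model that is not prescribed by the text: i.i.d. observations with a Gaussian mean shift N(0,1) to N(theta,1) and a geometric prior pi_k = rho(1 - rho)^(k-1). The names `geometric_prior` and `run_detector` are hypothetical.

```python
import math

def geometric_prior(rho, k):
    """Assumed prior pi_k = rho * (1 - rho)**(k - 1), k >= 1."""
    return rho * (1.0 - rho) ** (k - 1)

def run_detector(xs, theta, rho, A):
    """Run the average-likelihood-ratio detector on the sample xs.

    Returns (tau, history), where tau is the first n with G_n >= A
    (None if no alarm) and history holds (G_n, posterior_n, tail_n).
    """
    s = 0.0        # s_n = sum_{k<=n} pi_k * exp(Z_n^k)
    tau = None
    history = []
    for n, x in enumerate(xs, start=1):
        lr = math.exp(theta * x - 0.5 * theta ** 2)   # f1(x) / f0(x)
        # A new candidate change point k = n enters with weight pi_n.
        s = (s + geometric_prior(rho, n)) * lr
        tail = (1.0 - rho) ** n                       # P(lambda >= n + 1)
        g = s + tail                                  # statistic G_n
        posterior = s / g                             # P(lambda <= n | data)
        history.append((g, posterior, tail))
        if tau is None and g >= A:
            tau = n
    return tau, history
```

The recursion s_n = (s_{n-1} + pi_n) f1(x_n)/f0(x_n) relies on the i.i.d. assumption; in a non-i.i.d. model the likelihood ratio increment would be computed from conditional densities instead.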

3. The Upper Bound on the Global Probability of False Alarm

Let $P(\mathcal{F}_n)$ denote the restriction of the measure $P$ to the $\sigma$-algebra $\mathcal{F}_n = \sigma(X_1, \dots, X_n)$. The following lemma gives a simple upper bound for the PFA $P_\infty(\tau_A < \infty)$ in a general case. This conservative bound will be improved in Section 4.2.2 (Lemma 3) in the i.i.d. case.

Lemma 1. For any $A > 1$,

$$P_\infty(\tau_A < \infty) \le 1/A. \tag{3.1}$$

Proof. Noting that

$$G_n = \frac{dP^\pi(\mathcal{F}_n)}{dP_\infty(\mathcal{F}_n)}$$

and using the Wald likelihood ratio identity, we obtain

$$P_\infty(\tau_A < \infty) = E_\infty \mathbf{1}_{\{\tau_A < \infty\}} = E^\pi\big[G_{\tau_A}^{-1} \mathbf{1}_{\{\tau_A < \infty\}}\big].$$

By definition of the stopping time $\tau_A$, we have $G_{\tau_A} \ge A$ on the set $\{\tau_A < \infty\}$, which implies inequality (3.1).

□

Therefore, setting $A = A_\alpha = 1/\alpha$ guarantees $P_\infty(\tau_A < \infty) \le \alpha$, i.e.,

$$A_\alpha = 1/\alpha \implies \tau_{A_\alpha} \in \Delta_\infty(\alpha).$$
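
The bound of Lemma 1 can be checked empirically. The following sketch estimates the global false alarm probability by Monte Carlo under the no-change measure, again under assumptions that are not part of the text: a Gaussian model N(0,1) versus N(theta,1), a geometric prior with parameter rho, and a finite simulation horizon that truncates the event that the detector ever stops. The function names and the seed are hypothetical choices.

```python
import math
import random

def false_alarm_within(horizon, theta, rho, A, rng):
    """Simulate pre-change data only; True if G_n >= A for some n <= horizon."""
    s = 0.0
    for n in range(1, horizon + 1):
        x = rng.gauss(0.0, 1.0)                        # no change ever occurs
        lr = math.exp(theta * x - 0.5 * theta ** 2)
        s = (s + rho * (1.0 - rho) ** (n - 1)) * lr    # weighted LR sum
        if s + (1.0 - rho) ** n >= A:                  # G_n = s_n + tail
            return True
    return False

def estimate_global_pfa(runs, horizon, theta, rho, A, seed=0):
    """Fraction of simulated pre-change paths that trigger an alarm."""
    rng = random.Random(seed)
    hits = sum(false_alarm_within(horizon, theta, rho, A, rng)
               for _ in range(runs))
    return hits / runs
```

Since the horizon is finite, the estimate can only undercount alarms, so it is a slightly optimistic proxy for the global PFA; the gap between the estimate and 1/A reflects the threshold overshoot that Lemma 1 ignores.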

4. Asymptotic Optimality and Asymptotic Performance

4.1. The asymptotic lower bound for moments of the detection delay. The proof of asymptotic optimality of the detection procedure $\tau_A$ with $A = A_\alpha = 1/\alpha$ as $\alpha \to 0$ is performed in two steps. The first step is to obtain an asymptotic lower bound for moments of the detection delay $D_m^\pi(\tau)$ for any procedure from the class $\Delta_\infty(\alpha)$. The second step is to show that the procedure $\tau_{A_\alpha}$ achieves this lower bound.

It turns out that the second step is case dependent. For example, proofs and corresponding conditions of asymptotic optimality are different in the i.i.d. and non-i.i.d. cases (see the related remark in the sequel). For this reason, we will consider these two cases separately. However, for deriving the lower bound the same techniques can be used in all cases. We start with deriving the lower bound in a general, non-i.i.d. case.

Define $L_\alpha = q^{-1} |\log \alpha|$, $L_A = q^{-1} \log A$ and, for $0 < \varepsilon < 1$,

$$\gamma_{\varepsilon,\alpha}^\pi(\tau) = P^\pi\{\lambda \le \tau < \lambda + (1 - \varepsilon) L_\alpha\},$$

$$\gamma_{\varepsilon,A}^\pi(\tau_A) = P^\pi\{\lambda \le \tau_A < \lambda + (1 - \varepsilon) L_A\},$$

where $q$ is a positive finite number.

The number $q$ plays a key role in the asymptotic theory. In the general case, we do not specify any particular model for the observations. As a result, the LLR process has no specific structure. We hence have to impose some conditions on the behavior of the LLR process at least for large $n$. It is natural to assume that there exists a positive finite number $q = q(f_1, f_0)$ such that $n^{-1} Z_{k+n-1}^k$ converges almost surely to $q$, i.e.,

$$\frac{1}{n} Z_{k+n-1}^k \xrightarrow[n \to \infty]{P_k\text{-a.s.}} q \quad \text{for every } k < \infty. \tag{4.1}$$

As we discuss at the end of this section, (4.1) holds in the i.i.d. case with $q = I = E_1 Z_1^1$ whenever the Kullback-Leibler information number $I$ is positive and finite. Therefore, in the general case the number $q$ plays the role of the Kullback-Leibler number, and it can be treated as the asymptotic local divergence of the pre-change and post-change models (hypotheses). Theorem 1 below shows that the almost sure convergence condition (4.1) is sufficient (but not necessary) for obtaining lower bounds for all positive moments of the detection delay. In fact, the condition (4.2) in Lemma 2 and Theorem 1 holds whenever $Z_{k+n-1}^k / n$ converges almost surely to the number $q$.

The following lemma will be used to derive asymptotic lower bounds for any positive moment of the detection delay.

Lemma 2. Let $Z_n^k$ be defined as in (1.4) and assume that for some $q > 0$

$$P_k\Big\{\frac{1}{M} \max_{1 \le n \le M} Z_{k+n-1}^k \ge (1 + \varepsilon) q\Big\} \xrightarrow[M \to \infty]{} 0 \quad \text{for all } \varepsilon > 0 \text{ and } k \ge 1. \tag{4.2}$$

Then, for all $0 < \varepsilon < 1$,

$$\lim_{\alpha \to 0} \sup_{\tau \in \Delta_\infty(\alpha)} \gamma_{\varepsilon,\alpha}^\pi(\tau) = 0 \tag{4.3}$$

and

$$\lim_{A \to \infty} \gamma_{\varepsilon,A}^\pi(\tau_A) = 0. \tag{4.4}$$

By (1.6),

$$P^\pi(\tau < \lambda) = \sum_{k=1}^{\infty} \pi_k P_\infty(\tau < k) \le P_\infty(\tau < \infty). \tag{4.5}$$

Therefore, Lemma 1 of Tartakovsky and Veeravalli [36] may be applied to prove statements (4.3) and (4.4) for the classes of prior distributions considered in that work (i.e., for priors with exponential right tails and for heavy-tailed priors). However, here we do not restrict ourselves to these classes of prior distributions. The proof of the lemma for an arbitrary prior distribution is given in the Appendix.

Making use of Lemma 2 and Chebyshev's inequality allows us to obtain the asymptotic lower bounds for positive moments of the detection delay $D_m^\pi(\tau)$, $m > 0$.



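Theorem 1. Suppose condition (4.2) holds for some positive finite number $q$. Then, for all $m > 0$,

$$D_m^\pi(\tau_A) \ge \Big(\frac{\log A}{q}\Big)^m (1 + o(1)) \quad \text{as } A \to \infty \tag{4.6}$$

and

$$\inf_{\tau \in \Delta_\infty(\alpha)} D_m^\pi(\tau) \ge \Big(\frac{|\log \alpha|}{q}\Big)^m (1 + o(1)) \quad \text{as } \alpha \to 0, \tag{4.7}$$

where $o(1) \to 0$.

Proof. By the Chebyshev inequality, for any $0 < \varepsilon < 1$, $m > 0$, and any $\tau \in \Delta_\infty(\alpha)$,

$$E^\pi[(\tau - \lambda)^+]^m \ge [(1 - \varepsilon) L_\alpha]^m P^\pi\{\tau - \lambda \ge (1 - \varepsilon) L_\alpha\},$$

where

$$P^\pi\{\tau - \lambda \ge (1 - \varepsilon) L_\alpha\} = P^\pi\{\tau \ge \lambda\} - \gamma_{\varepsilon,\alpha}^\pi(\tau).$$

By (4.5), for any $\tau \in \Delta_\infty(\alpha)$,

$$P^\pi(\tau \ge \lambda) \ge 1 - P_\infty(\tau < \infty) \ge 1 - \alpha.$$

Thus, for any $\tau \in \Delta_\infty(\alpha)$,

$$D_m^\pi(\tau) = \frac{E^\pi[(\tau - \lambda)^+]^m}{P^\pi\{\tau \ge \lambda\}} \ge [(1 - \varepsilon) L_\alpha]^m \Big[1 - \frac{\gamma_{\varepsilon,\alpha}^\pi(\tau)}{P^\pi\{\tau \ge \lambda\}}\Big] \ge [(1 - \varepsilon) L_\alpha]^m \Big[1 - \frac{\gamma_{\varepsilon,\alpha}^\pi(\tau)}{1 - \alpha}\Big]. \tag{4.8}$$

Since $\varepsilon$ can be arbitrarily small and, by Lemma 2, $\sup_{\tau \in \Delta_\infty(\alpha)} \gamma_{\varepsilon,\alpha}^\pi(\tau) \to 0$ as $\alpha \to 0$, the asymptotic lower bound (4.7) follows.

To prove (4.6), it suffices to repeat the above argument replacing $\alpha$ with $1/A$ and using the fact that $P_\infty(\tau_A < \infty) \le 1/A$ by Lemma 1. □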

Consider now the traditional i.i.d. model (1.1) with pre-change and post-change densities $f_0(x)$ and $f_1(x)$ (with respect to a sigma-finite measure $\mu(x)$), in which case the LLR (1.4) is given by

$$Z_n^k = \sum_{t=k}^{n} \log \frac{f_1(X_t)}{f_0(X_t)}, \quad k \le n. \tag{4.9}$$

Define the Kullback-Leibler information number

$$I = I(f_1, f_0) = \int \log\Big(\frac{f_1(x)}{f_0(x)}\Big) f_1(x)\, d\mu(x),$$

and assume that $0 < I < \infty$. Then $E_k Z_{k+n-1}^k = I n$ and the almost sure convergence condition (4.1) holds with $q = I$ by the strong law of large numbers, i.e.,

$$\frac{1}{n} Z_{k+n-1}^k \xrightarrow[n \to \infty]{P_k\text{-a.s.}} I \quad \text{for every } k < \infty. \tag{4.10}$$

Note that in the i.i.d. case condition (4.2) holds with $q = I$. Therefore, as the first step we have the following corollary that establishes the lower bound in the i.i.d. case.

Corollary 1. Let the Kullback-Leibler information number be positive and finite, $0 < I < \infty$. Then the asymptotic lower bounds (4.6) and (4.7) hold with $q = I$.
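
As a small worked example of the Kullback-Leibler information number, the sketch below evaluates the defining integral numerically for an assumed Gaussian mean-shift model, f0 = N(0,1) and f1 = N(theta,1), for which the closed form is theta^2 / 2. The model, the integration range, and the function names are illustrative assumptions rather than anything fixed by the text.

```python
import math

def gaussian_pdf(x, mean):
    """Density of N(mean, 1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def kl_number(theta, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoid approximation of I = integral of log(f1/f0) * f1 dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        llr = theta * x - 0.5 * theta ** 2       # log f1(x)/f0(x)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end weights
        total += weight * llr * gaussian_pdf(x, theta)
    return total * h
```

In this model the first-order lower bound of Corollary 1 for the average delay reads |log alpha| / (theta^2 / 2), so halving the shift theta quadruples the leading term of the delay.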

4.2. Asymptotic optimality in the i.i.d. case. We now proceed with deriving first-order approximations to the moments of the detection delay of the detection test $\tau_A$ as $A \to \infty$ and establishing its first-order asymptotic optimality when $A = 1/\alpha$ and $\alpha \to 0$ in the i.i.d. case.

4.2.1. First-order approximations. In order to prove the asymptotic optimality property, it suffices to derive an upper bound and to show that this bound is asymptotically the same as the lower bound specified in Corollary 1.

It is easily seen that for any $k \ge 1$

$$G_n = \Pi_{n+1} + \sum_{j=1}^{n} \pi_j e^{Z_n^j} = e^{Z_n^k} \Big(\pi_k + \Pi_{n+1} e^{-Z_n^k} + \sum_{j=1}^{k-1} \pi_j e^{\sum_{i=j}^{k-1} \Delta Z_i} + \sum_{j=k}^{n-1} \pi_{j+1} e^{-\sum_{i=k}^{j} \Delta Z_i}\Big) \tag{4.11}$$

$$\ge \pi_k e^{Z_n^k}, \tag{4.12}$$

where $\Delta Z_i = \log[f_1(X_i)/f_0(X_i)]$. Thus, for any $k \ge 1$, the stopping time $\tau_A$ does not exceed the stopping time

$$\nu_k(A) = \min\{n \ge k : Z_n^k \ge \log(A/\pi_k)\}. \tag{4.13}$$

Moreover,

$$(\tau_A - k)^+ \le \nu_k(A) - k.$$

By the i.i.d. property of the data, the random variables $\Delta Z_i$, $i = 1, 2, \dots$ are also i.i.d. and hence the distribution of $\nu_k(A) - k + 1$ under $P_k$ is the same as the $P_1$-distribution of the stopping time

$$\nu_1(A, \pi_k) = \min\{n \ge 1 : Z_n^1 \ge \log(A/\pi_k)\}. \tag{4.14}$$

Therefore, for all $k \ge 1$

$$E_k[(\tau_A - k)^+]^m \le E_1[\nu_1(A, \pi_k) - 1]^m, \tag{4.15}$$

which can be used to obtain the desired upper bound.
Details are given in the following theorem.

Theorem 2. Let $0 < I < \infty$ and let the prior distribution be such that $\sum_{k=1}^{\infty} |\log \pi_k|^m \pi_k < \infty$.

(i) As $A \to \infty$,

$$D_m^\pi(\tau_A) \sim \Big(\frac{\log A}{I}\Big)^m. \tag{4.16}$$

(ii) If $A_\alpha = 1/\alpha$, then $\tau_{A_\alpha} \in \Delta_\infty(\alpha)$ and, as $\alpha \to 0$, for all $m \ge 1$

$$\inf_{\tau \in \Delta_\infty(\alpha)} D_m^\pi(\tau) \sim D_m^\pi(\tau_{A_\alpha}) \sim \Big(\frac{|\log \alpha|}{I}\Big)^m. \tag{4.17}$$

Proof. (i) In the i.i.d. case, the LLR $Z_n^1$, $n \ge 1$, is a random walk with mean $E_1 Z_n^1 = I n$. Since $I$ is positive and finite, $E_1\{-\min(0, Z_1^1)\}^m < \infty$ for all $m > 0$. Indeed,

$$E_1 \exp\{-\min(0, Z_1^1)\} = E_1 e^{-Z_1^1} \mathbf{1}_{\{Z_1^1 < 0\}} + P_1(Z_1^1 \ge 0) \le E_1 e^{-Z_1^1} + 1 = 2.$$

Therefore, we can apply Theorem III.8.1 of Gut [11] that yields, for all $m \ge 1$,

$$E_1[\nu_1(A, \pi_k)]^m = \Big(\frac{\log(A/\pi_k)}{I}\Big)^m (1 + o(1)) \quad \text{as } A \to \infty. \tag{4.18}$$

Using (4.18) along with (4.15) implies

$$E_k[(\tau_A - k)^+]^m \le \Big(\frac{\log(A/\pi_k)}{I}\Big)^m [1 + \varepsilon(k, m, A)], \tag{4.19}$$

where $\varepsilon(k, m, A) \to 0$ as $A \to \infty$.

Write $a = \log A$. Now, averaging in (4.19) over the prior distribution, we obtain

$$\sum_{k=1}^{\infty} \pi_k E_k[(\tau_A - k)^+]^m \le \Big(\frac{a}{I}\Big)^m \Big\{\sum_{k=1}^{\infty} \pi_k \Big(1 + \frac{|\log \pi_k|}{a}\Big)^m + \sum_{k=1}^{\infty} \pi_k \Big(1 + \frac{|\log \pi_k|}{a}\Big)^m \varepsilon(k, m, A)\Big\} \quad \text{as } A \to \infty. \tag{4.20}$$

Since by the conditions of the theorem $\sum_{k=1}^{\infty} |\log \pi_k|^m \pi_k < \infty$, it follows that

$$\sum_{k=1}^{\infty} \pi_k \Big(1 + \frac{|\log \pi_k|}{a}\Big)^m = 1 + o(1) \quad \text{as } A \to \infty. \tag{4.21}$$

The important observation is that since $|\log \pi_k| \to \infty$ as $k \to \infty$, the asymptotic equality (4.18) and, hence, the inequality (4.19) also hold for any $A > 1$ as $k \to \infty$. This means that $\varepsilon(k, m, A) \to 0$ as $k \to \infty$ for any fixed $A > 1$ and also as $A \to \infty$. It follows that

$$\sum_{k=1}^{\infty} \pi_k |\log \pi_k|^m \varepsilon(k, m, A) < \infty \quad \text{for any } A > 1$$

and, hence,

$$\sum_{k=1}^{\infty} \pi_k \Big(1 + \frac{|\log \pi_k|}{a}\Big)^m \varepsilon(k, m, A) \to 0 \quad \text{as } A \to \infty. \tag{4.22}$$

Combining (4.20), (4.21), and (4.22) yields the asymptotic inequality

$$E^\pi[(\tau_A - \lambda)^+]^m \le \Big(\frac{\log A}{I}\Big)^m (1 + o(1)) \quad \text{as } A \to \infty.$$

Finally, noting that $P^\pi(\tau_A \ge \lambda) \ge P_\infty(\tau_A = \infty) \ge 1 - 1/A$ (cf. Lemma 1) and

$$E^\pi[(\tau_A - \lambda)^+]^m = P^\pi(\tau_A \ge \lambda)\, D_m^\pi(\tau_A),$$

we obtain the upper bound

$$D_m^\pi(\tau_A) \le \Big(\frac{\log A}{I}\Big)^m (1 + o(1)).$$

Comparing this asymptotic upper bound with the lower bound (4.6) (see Corollary 1) completes the proof of (4.16).

(ii) The fact that $\tau_{A_\alpha} \in \Delta_\infty(\alpha)$ when $A_\alpha = 1/\alpha$ follows from Lemma 1. The asymptotic relation (4.17) follows from (4.16) and the lower bound (4.7).

□
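
The prior condition of Theorem 2 is easy to examine for a concrete prior. The sketch below, which assumes a geometric prior pi_k = rho(1 - rho)^(k-1) (one admissible choice, not the only one), evaluates the sum of pi_k |log pi_k|^m numerically, compares the case m = 1 with its closed form, and computes the first-order term (log A / I)^m. All function names are hypothetical.

```python
import math

def prior_log_moment(rho, m=1, terms=5000):
    """Truncated sum over k of pi_k * |log pi_k|**m for a geometric prior."""
    total = 0.0
    for k in range(1, terms + 1):
        # log pi_k = log(rho) + (k - 1) * log(1 - rho), kept in log form
        log_pi = math.log(rho) + (k - 1) * math.log1p(-rho)
        total += math.exp(log_pi) * abs(log_pi) ** m
    return total

def prior_log_moment_closed_form(rho):
    """Closed form for m = 1: -log(rho) - ((1 - rho) / rho) * log(1 - rho)."""
    return -math.log(rho) - (1.0 - rho) / rho * math.log(1.0 - rho)

def first_order_delay_moment(A, I, m=1):
    """Leading term (log A / I)**m of the m-th delay moment."""
    return (math.log(A) / I) ** m
```

For the geometric prior the sum is finite for every m, so the theorem applies to all moments; the value for m = 1 is also the prior-dependent constant that enters the higher-order approximations to the average delay.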

4.2.2. Higher-order approximations. The upper bound $P_\infty(\tau_A < \infty) \le 1/A$ (see (3.1)) for the global PFA, which neglects the threshold overshoot, holds in the most general, non-i.i.d. case. In the i.i.d. case, an accurate approximation for $P_\infty(\tau_A < \infty)$ can be obtained by taking into account the overshoot using the nonlinear renewal theory argument (see Woodroofe [37] and Siegmund [27]). This is important in situations where the upper bound (3.1) that ignores the overshoot is conservative, which is always the case when the densities $f_1(x)$ and $f_0(x)$ are not close enough.

In order to apply relevant results from nonlinear renewal theory, we have to rewrite the stopping time $\tau_A$ in the form of a random walk crossing a constant threshold plus a nonlinear term that is slowly changing in the sense defined in [27, 37]. Using (4.11) and writing

$$\xi_n^k = \log\Big(\pi_k + \Pi_{n+1} e^{-Z_n^k} + \sum_{j=1}^{k-1} \pi_j e^{\sum_{i=j}^{k-1} \Delta Z_i} + \sum_{j=k}^{n-1} \pi_{j+1} e^{-\sum_{i=k}^{j} \Delta Z_i}\Big), \tag{4.23}$$

we obtain that for every $k \ge 1$

$$\log G_n = Z_n^k + \xi_n^k. \tag{4.24}$$

Therefore, on $\{\tau_A \ge k\}$ for any $k \ge 1$, the stopping time $\tau_A$ can be written in the following form:

$$\tau_A = \min\{n \ge k : Z_n^k + \xi_n^k \ge a\}, \quad a = \log A, \tag{4.25}$$

where $\xi_n^k$ is given by (4.23) and $Z_n^k$, $n \ge k$, is a random walk with mean $E_k Z_{k+n-1}^k = I n$.
For b > 0, define r/ b as 

(4.26) r) b = min{n ^ 1 : Z l n ^ b}, 

and let x fe = Z~L — 6 (on {7^ < oo}) denote the excess (overshoot) of the statistic Z\ over the 
threshold b at time n = r/ b . Let 

(4.27) H(y,I)= \imP 1 {H b ^y} 

b^oo 

be the limiting distribution of the overshoot and let 

POO 

(4.28) COO = lim E ie -" 6 = / e - y dH(y,I). 

b-^oo J 

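The important observation is that $\xi_n^k$, $n \ge k$, are slowly changing. To see this it suffices to note that, as $n \to \infty$, the values of $\xi_n^k$ converge to the random variable

$$\xi_\infty^k = \log\Big(\pi_k + \sum_{j=1}^{k-1} \pi_j e^{\sum_{i=j}^{k-1} \Delta Z_i} + \sum_{j=k}^{\infty} \pi_{j+1} e^{-\sum_{i=k}^{j} \Delta Z_i}\Big),$$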



ASYMPTOTIC OPTIMALITY IN BAYESIAN CHANGE-POINT DETECTION PROBLEMS 1 1 

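which has finite negative expectation. Indeed, on the one hand $\xi_\infty^k \ge \log \pi_k$, and on the other hand, by Jensen's inequality,

$$E_k \xi_\infty^k \le \log\Big(\pi_k + \sum_{j=1}^{k-1} \pi_j E_k e^{\sum_{i=j}^{k-1} \Delta Z_i} + \sum_{j=k}^{\infty} \pi_{j+1} E_k e^{-\sum_{i=k}^{j} \Delta Z_i}\Big) = \log\Big(\pi_k + \sum_{j=1}^{k-1} \pi_j + \sum_{j=k}^{\infty} \pi_{j+1}\Big) = \log\Big(\sum_{j=1}^{\infty} \pi_j\Big) = 0,$$

where we used the equalities

$$E_k e^{\sum_{i=j}^{k-1} \Delta Z_i} = E_k \prod_{i=j}^{k-1} \frac{f_1(X_i)}{f_0(X_i)} = 1$$

and

$$E_k e^{-\sum_{i=k}^{j} \Delta Z_i} = E_k \prod_{i=k}^{j} \frac{f_0(X_i)}{f_1(X_i)} = 1,$$

which hold since, obviously,

$$E_k \frac{f_1(X_i)}{f_0(X_i)} = \int \frac{f_1(x)}{f_0(x)} f_0(x)\, d\mu(x) = 1 \quad \text{for } i < k$$

and

$$E_k \frac{f_0(X_i)}{f_1(X_i)} = \int \frac{f_0(x)}{f_1(x)} f_1(x)\, d\mu(x) = 1 \quad \text{for } i \ge k.$$

An important consequence of the slowly changing property is that, under mild conditions, the limiting distribution of the overshoot of a random walk does not change by the addition of a slowly changing nonlinear term (see Theorem 4.1 of Woodroofe [37]). This property allows us to derive an accurate asymptotic approximation for the probability of false alarm, which is important in situations where the value of $I$ is moderate. (For small values of $I$ the overshoot can be neglected.) The following lemma presents an exact result.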

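Lemma 3. Suppose $Z_n^1$, $n \ge 1$, are nonarithmetic with respect to $P_1$. Let $0 < I < \infty$. Then

$$P_\infty(\tau_A < \infty) = \frac{\zeta(I)}{A} (1 + o(1)) \quad \text{as } A \to \infty. \tag{4.29}$$

Proof. Obviously,

$$P_\infty(\tau_A < \infty) = E^\pi\big\{G_{\tau_A}^{-1} \mathbf{1}_{\{\tau_A < \infty\}}\big\} = E^\pi\big\{(A/(A G_{\tau_A})) \mathbf{1}_{\{\tau_A < \infty\}}\big\} = \frac{1}{A} E^\pi\big\{e^{-\chi_a} \mathbf{1}_{\{\tau_A < \infty\}}\big\},$$

where $\chi_a = \log G_{\tau_A} - a$. Since $\chi_a \ge 0$ and $P^\pi(\tau_A < \lambda) \le P_\infty(\tau_A < \infty) \le 1/A$, it follows that

$$E^\pi\big\{e^{-\chi_a} \mathbf{1}_{\{\tau_A < \infty\}}\big\} = E^\pi\{e^{-\chi_a} \mid \tau_A < \lambda\} P^\pi(\tau_A < \lambda) + E^\pi\{e^{-\chi_a} \mid \tau_A \ge \lambda\} (1 - P^\pi(\tau_A < \lambda)) = E^\pi\{e^{-\chi_a} \mid \tau_A \ge \lambda\} + O(1/A) \quad \text{as } A \to \infty.$$

Therefore, it suffices to evaluate the value of

$$E^\pi\{e^{-\chi_a} \mid \tau_A \ge \lambda\} = \sum_{k=1}^{\infty} P^\pi(\lambda = k \mid \tau_A \ge \lambda)\, E_k\{e^{-\chi_a} \mid \tau_A \ge k\}.$$

Recall that, by (4.25), for any $1 \le k < \infty$,

$$\tau_A = \min\{n \ge k : Z_n^k + \xi_n^k \ge a\} \quad \text{on } \{\tau_A \ge k\},$$

where $Z_n^k$, $n \ge k$, is a random walk with the expectation $E_k Z_k^k = I$ and $\xi_n^k$, $n \ge k$, are slowly changing under $P_k$. Since, by conditions of the lemma, $0 < I < \infty$, we can apply Theorem 4.1 of Woodroofe [37] to obtain

$$\lim_{A \to \infty} E_k\{e^{-\chi_a} \mid \tau_A \ge k\} = \int_0^\infty e^{-y}\, dH(y, I) = \zeta(I).$$

Since $P_k(\tau_A \ge k) \ge 1 - 1/A$ and $P^\pi(\tau_A \ge \lambda) \ge 1 - 1/A$,

$$\lim_{A \to \infty} P^\pi(\lambda = k \mid \tau_A \ge \lambda) = \lim_{A \to \infty} \frac{\pi_k P_k(\tau_A \ge k)}{P^\pi(\tau_A \ge \lambda)} = \pi_k$$

and, therefore,

$$\lim_{A \to \infty} E^\pi\{e^{-\chi_a} \mid \tau_A \ge \lambda\} = \lim_{A \to \infty} E^\pi\big\{e^{-\chi_a} \mathbf{1}_{\{\tau_A < \infty\}}\big\} = \zeta(I),$$

which completes the proof of (4.29).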

□ 
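
The overshoot constant in Lemma 3 can be approximated by simulation. The sketch below estimates the expected value of exp(-overshoot) for the one-sided first passage of the log-likelihood ratio random walk over a large threshold b. It assumes a Gaussian mean-shift model, N(0,1) versus N(theta,1), with data generated under the post-change density, and uses a finite b as a stand-in for the limit; the function names, the choice of b, and the seed are hypothetical.

```python
import math
import random

def overshoot(b, theta, rng):
    """Excess of the LLR random walk over b at its first passage time."""
    z = 0.0
    while z < b:
        # Post-change observation X ~ N(theta, 1); LLR increment theta*X - theta^2/2.
        z += theta * rng.gauss(theta, 1.0) - 0.5 * theta ** 2
    return z - b

def estimate_zeta(b, theta, runs, seed=0):
    """Monte Carlo average of exp(-overshoot), a proxy for the limit as b grows."""
    rng = random.Random(seed)
    total = sum(math.exp(-overshoot(b, theta, rng)) for _ in range(runs))
    return total / runs
```

Multiplying such an estimate by 1/A gives a refined false alarm approximation in the spirit of Lemma 3; since the estimate is below one, the threshold needed for a target false alarm level is smaller than the conservative choice 1/alpha.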

Under an additional, second moment condition, the nonlinear renewal theorem [37] also allows for obtaining a higher-order approximation for the ADD:

$$E_k(\tau_A - k \mid \tau_A \ge k) = I^{-1}[\log A - C_k(I) + \varkappa(I)] + o(1), \quad k \ge 1; \tag{4.30}$$

$$\mathrm{ADD}^\pi(\tau_A) = E^\pi(\tau_A - \lambda \mid \tau_A \ge \lambda) = I^{-1}\Big[\log A - \sum_{k=1}^{\infty} C_k(I) \pi_k + \varkappa(I)\Big] + o(1), \tag{4.31}$$

where $C_k(I) = E_k \xi_\infty^k$ and

$$\varkappa(I) = \lim_{a \to \infty} E_1 \varkappa_a = \int_0^\infty y\, dH(y, I) \tag{4.32}$$

is the limiting average overshoot in the one-sided test.

However, approximations (4.30) and (4.31) have little value, since it is usually impossible to compute the constant $C_k(I)$. Instead, we propose the following approximations:

$$E_k(\tau_A - k \mid \tau_A \ge k) \approx I^{-1}[\log(A/\pi_k) + \varkappa(I) - 1], \quad k \ge 1; \tag{4.33}$$

$$\mathrm{ADD}^\pi(\tau_A) \approx I^{-1}\Big[\log A + \sum_{k=1}^{\infty} \pi_k |\log \pi_k| + \varkappa(I) - 1\Big], \tag{4.34}$$

which use the minimal value of the random variable $\xi_\infty^k$, namely $\log \pi_k$. Clearly, one may expect that these approximations will overestimate the true values. On the other hand, it is expected that the approximations that ignore the overshoot given by

$$E_k(\tau_A - k \mid \tau_A \ge k) \approx I^{-1}[\log(A/\pi_k) - 1], \quad k \ge 1; \tag{4.35}$$

$$\mathrm{ADD}^\pi(\tau_A) \approx I^{-1}\Big[\log A + \sum_{k=1}^{\infty} \pi_k |\log \pi_k| - 1\Big] \tag{4.36}$$

will underestimate the true values.

The constants $\zeta(I)$ and $\varkappa(I)$ defined in (4.28) and (4.32) are the subject of renewal theory. They can be computed either exactly or approximately in a variety of particular examples.

4.3. Asymptotic optimality in the non-i.i.d. case. In this section, we deal with the general non-i.i.d. model (1.2) and show that under certain quite general conditions the detection procedure (2.1) is asymptotically optimal for small $\alpha$.

As we established in Theorem 1 above, the strong law of large numbers (4.1) is sufficient for obtaining the lower bound for the moments of the detection delay. However, in general, this condition is not sufficient for asymptotic optimality with respect to the moments of the detection delay. Therefore, some additional conditions are needed to guarantee asymptotic optimality.

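4.3.1. Weak asymptotic optimality. We begin with answering the question of whether some asymptotic optimality result can still be obtained under the almost sure convergence condition (4.1). The following theorem establishes asymptotic optimality of the procedure $\tau_{A_\alpha}$ in a weak probabilistic sense.

Theorem 3 (Weak Asymptotic Optimality). Let there exist a finite positive number $q$ such that condition (4.1) holds, and let $A = A_\alpha = 1/\alpha$. Then, for every $0 < \varepsilon < 1$,

$$\inf_{\tau \in \Delta_\infty(\alpha)} P_k\{(\tau - k)^+ > \varepsilon (\tau_{A_\alpha} - k)^+\} \xrightarrow[\alpha \to 0]{} 1 \quad \text{for all } k \ge 1 \tag{4.37}$$

and

$$\inf_{\tau \in \Delta_\infty(\alpha)} P^\pi\{(\tau - \lambda)^+ > \varepsilon (\tau_{A_\alpha} - \lambda)^+\} \xrightarrow[\alpha \to 0]{} 1. \tag{4.38}$$

Proof. Extracting the term $e^{Z_n^k}$, the statistic $G_n$ can be written as follows:

$$G_n = \pi_k e^{Z_n^k} \Big(1 + \frac{1}{\pi_k}\Big[\Pi_{n+1} e^{-Z_n^k} + \sum_{j=1}^{k-1} \pi_j e^{Z_{k-1}^j} + \sum_{j=k}^{n-1} \pi_{j+1} e^{-Z_j^k}\Big]\Big). \tag{4.39}$$

Writing

$$Y_n^k = \frac{1}{\pi_k}\Big[\Pi_{n+1} e^{-Z_n^k} + \sum_{j=1}^{k-1} \pi_j e^{Z_{k-1}^j} + \sum_{j=k}^{n-1} \pi_{j+1} e^{-Z_j^k}\Big],$$

we obtain that for every $k \ge 1$

$$\log G_n = Z_n^k + \log(1 + Y_n^k) + \log \pi_k. \tag{4.40}$$

It is easily verified that $E_k e^{-Z_n^k} = 1$, $E_k e^{-Z_j^k} = 1$ for $j \ge k$, and $E_k e^{Z_{k-1}^j} = 1$ for $j \le k - 1$ and, hence,

$$E_k Y_n^k = \frac{1}{\pi_k}\Big(\Pi_{n+1} + \sum_{j=1}^{k-1} \pi_j + \sum_{j=k}^{n-1} \pi_{j+1}\Big) = (1 - \pi_k)/\pi_k.$$

Since $\log(1 + Y_n^k)$ is non-negative, applying Markov's inequality we obtain that for every $\varepsilon > 0$

$$P_k\{n^{-1} \log(1 + Y_n^k) \ge \varepsilon\} \le e^{-n\varepsilon} (1 + E_k Y_n^k) = e^{-n\varepsilon}/\pi_k.$$

It follows that for all $\varepsilon > 0$

$$\sum_{n=k}^{\infty} P_k\{n^{-1} \log(1 + Y_n^k) > \varepsilon\} < \infty,$$

which implies that

$$\frac{\log(1 + Y_n^k)}{n - k + 1} \xrightarrow[n \to \infty]{P_k\text{-a.s.}} 0 \quad \text{for every } k \ge 1. \tag{4.41}$$

Using (4.1), (4.40), and (4.41) yields

$$\frac{1}{n} \log G_n \xrightarrow[n \to \infty]{P_k\text{-a.s.}} q \quad \text{for every } k \ge 1. \tag{4.42}$$

Clearly, $\tau_A \to \infty$ as $A \to \infty$ almost surely under $P_k$ for every $k \ge 1$ and, by (4.42), $G_n \to \infty$ a.s. under $P_k$ (as $n \to \infty$), which implies that $P_k(\tau_A < \infty) = 1$. Therefore,

$$q \xleftarrow[A \to \infty]{P_k\text{-a.s.}} \frac{\log G_{\tau_A - 1}}{\tau_A} < \frac{\log A}{\tau_A} \le \frac{\log G_{\tau_A}}{\tau_A} \xrightarrow[A \to \infty]{P_k\text{-a.s.}} q \tag{4.43}$$

and, since $P_k(\tau_A < k) \le 1/A \to 0$, it follows that

$$\frac{(\tau_{A_\alpha} - k)^+}{|\log \alpha|} \to \frac{1}{q} \quad \text{in } P_k\text{-probability as } \alpha \to 0 \text{ for all } k \ge 1 \tag{4.44}$$

and

$$\frac{(\tau_{A_\alpha} - \lambda)^+}{|\log \alpha|} \to \frac{1}{q} \quad \text{in } P^\pi\text{-probability as } \alpha \to 0. \tag{4.45}$$

Next, since the right side in inequality (6.1) (see Appendix) does not depend on the stopping time $\tau$, it follows that

$$\lim_{\alpha \to 0} \inf_{\tau \in \Delta_\infty(\alpha)} P_k\{\tau - k \ge \varepsilon q^{-1} |\log \alpha|\} = 1 \quad \text{for all } k \ge 1 \text{ and } 0 < \varepsilon < 1, \tag{4.46}$$

which along with (4.44) proves (4.37).

Finally, the asymptotic relation (4.38) follows from (4.45) and Lemma 2, which implies that

$$\lim_{\alpha \to 0} \inf_{\tau \in \Delta_\infty(\alpha)} P^\pi(\tau - \lambda \ge \varepsilon q^{-1} |\log \alpha|) = 1 \quad \text{for all } 0 < \varepsilon < 1.$$

□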



4.3.2. First-order asymptotic optimality. We now proceed with the first-order (FO) asymptotic optimality with respect to positive moments of the detection delay $D_m^\pi(\tau)$. We first note that using the method proposed by Lai [16] it can be shown that the ADD of the detection procedure $\tau_{A_\alpha}$ attains the lower bound (4.7) ($m = 1$) under the condition

$$\max_{k} P_k\{n^{-1} Z^k_{k+n-1} \le q - \varepsilon\} \to 0 \quad \text{as } n \to \infty \text{ for all } \varepsilon > 0.$$

It can also be shown that, for any $m \le r$, a sufficient condition for $D_m^\pi(\tau_{A_\alpha})$ to attain the lower bound (4.7) is

$$\sum_{k=1}^{\infty}\pi_k\sum_{n=1}^{\infty} n^{r-1} P_k\{Z^k_{k+n-1} \le (q - \varepsilon)n\} < \infty \quad \text{for all } \varepsilon > 0.$$

This latter condition is closely related to the following condition

$$\sum_{k=1}^{\infty}\pi_k E_k (T_{k,\varepsilon})^r < \infty \quad \text{for all } \varepsilon > 0, \tag{4.47}$$

where

$$T_{k,\varepsilon} = \sup\{n \ge 1 : n^{-1} Z^k_{k+n-1} - q < -\varepsilon\} \quad (\sup\{\varnothing\} = 0) \tag{4.48}$$

is the last time when $n^{-1} Z^k_{k+n-1}$ leaves the region $[q - \varepsilon, \infty)$.
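Theorem 4. (FO Asymptotic Optimality) Let conditions (4.2) and (4.47) hold for some positive finite $q$ and some $r \ge 1$. Assume that

$$\sum_{k=1}^{\infty} |\log\pi_k|^m \pi_k < \infty \quad \text{for } m \le r. \tag{4.49}$$

Then for all $m \le r$

$$D_m^\pi(\tau_A) \sim \left(\frac{\log A}{q}\right)^m \quad \text{as } A \to \infty. \tag{4.50}$$

If $A = A_\alpha = 1/\alpha$, then $\tau_{A_\alpha} \in \Delta_\infty(\alpha)$ and for all $m \le r$,

$$\inf_{\tau\in\Delta_\infty(\alpha)} D_m^\pi(\tau) \sim D_m^\pi(\tau_{A_\alpha}) \sim \left(\frac{|\log\alpha|}{q}\right)^m \quad \text{as } \alpha \to 0. \tag{4.51}$$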

Proof. To prove (4.50) it suffices to show that the lower bound (4.6) in Theorem 1 is also asymptotically the upper bound, i.e.,

$$D_m^\pi(\tau_A) \le \left(\frac{\log A}{q}\right)^m (1 + o(1)) \quad \text{as } A \to \infty. \tag{4.52}$$

It follows from equality (4.39) that

$$G_n \ge e^{Z_n^k}\pi_k,$$

and, therefore, for any $k \ge 1$,

$$(\tau_A - k)^+ \le \nu_k(A) = \min\{n \ge 1 : Z^k_{k+n-1} \ge \log(A/\pi_k)\}. \tag{4.53}$$

Thus,

$$D_m^\pi(\tau_A) \le \frac{\sum_{k=1}^{\infty}\pi_k E_k(\nu_k(A))^m}{P^\pi(\tau_A \ge \lambda)}.$$

Since by Lemma 1 $P^\pi(\tau_A \ge \lambda) \ge 1 - P_\infty(\tau_A < \infty) \ge 1 - 1/A$, it is sufficient to prove that

$$\sum_{k=1}^{\infty}\pi_k E_k(\nu_k(A))^m \le \left(\frac{\log A}{q}\right)^m (1 + o(1)) \quad \text{as } A \to \infty. \tag{4.54}$$

By the definition of the stopping time $\nu_k$,

$$Z^k_{k+\nu_k-2} < \log(A/\pi_k) \quad \text{on } \{\nu_k < \infty\}.$$

On the other hand, by the definition of the last entry time (4.48),

$$Z^k_{k+\nu_k-2} \ge (q - \varepsilon)(\nu_k - 1) \quad \text{on } \{\nu_k > 1 + T_{k,\varepsilon}\}.$$

Hence,

$$(q - \varepsilon)(\nu_k - 1) < \log(A/\pi_k) \quad \text{on } \{T_{k,\varepsilon} + 1 < \nu_k < \infty\}$$

and we obtain

$$E_k\nu_k^m = E_k\nu_k^m 1_{\{T_{k,\varepsilon}+1 < \nu_k < \infty\}} + E_k\nu_k^m 1_{\{\nu_k \le T_{k,\varepsilon}+1\}} \le \left(1 + \frac{\log(A/\pi_k)}{q - \varepsilon}\right)^m + E_k(1 + T_{k,\varepsilon})^m.$$

Averaging over the prior distribution yields

$$\sum_{k=1}^{\infty}\pi_k E_k(\nu_k(A))^m \le \sum_{k=1}^{\infty}\pi_k\left(1 + \frac{\log(A/\pi_k)}{q - \varepsilon}\right)^m + \sum_{k=1}^{\infty}\pi_k E_k(1 + T_{k,\varepsilon})^m.$$

By conditions (4.49) and (4.47), $\sum_{k=1}^{\infty}|\log\pi_k|^m\pi_k < \infty$ and $\sum_{k=1}^{\infty}\pi_k E_k(T_{k,\varepsilon})^m < \infty$ for $m \le r$. Since $\varepsilon$ can be arbitrarily small, the asymptotic upper bound (4.54) follows and the proof of (4.50) is complete.

Asymptotic relations (4.51) follow from (4.50) and the asymptotic lower bound (4.7) in Theorem 1.

□
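The pathwise bound (4.53), $(\tau_A - k)^+ \le \nu_k(A)$, is the key step of the proof and is easy to probe by simulation. The sketch below is illustrative only: it assumes an i.i.d. unit-variance Gaussian mean shift (so the LLR increment is $\theta X_n - \theta^2/2$), a geometric prior with parameter `p`, and the i.i.d. recursion for $G_n$ quoted in Section 5.1. The function names and parameter values are hypothetical.

```python
import math
import random

def simulate_llr(k, theta, horizon, rng):
    """LLR increments dZ_n = theta*X_n - theta^2/2 for a Gaussian mean shift
    from 0 to theta at time k (an illustrative i.i.d. model)."""
    dz = [0.0]  # index 0 unused so that dz[n] belongs to observation n
    for n in range(1, horizon + 1):
        x = rng.gauss(theta if n >= k else 0.0, 1.0)
        dz.append(theta * x - 0.5 * theta ** 2)
    return dz

def tau(dz, p, A):
    """First n with G_n >= A under a geometric(p) prior, using the i.i.d.
    recursion; returns None if there is no alarm within the horizon."""
    g = 1.0
    for n in range(1, len(dz)):
        tail = (1 - p) ** n                      # Pi_{n+1}
        g = (g - tail) * math.exp(dz[n]) + tail
        if g >= A:
            return n
    return None

def nu(dz, p, A, k):
    """nu_k(A) = min{n >= 1 : Z^k_{k+n-1} >= log(A / pi_k)} as in (4.53)."""
    level = math.log(A / (p * (1 - p) ** (k - 1)))
    z = 0.0
    for n in range(1, len(dz) - k + 1):
        z += dz[k + n - 1]
        if z >= level:
            return n
    return None

def check_bound(seed, k=5, theta=1.0, p=0.1, A=100.0, horizon=500):
    """True if (tau_A - k)^+ <= nu_k(A) on the simulated path."""
    rng = random.Random(seed)
    dz = simulate_llr(k, theta, horizon, rng)
    t, v = tau(dz, p, A), nu(dz, p, A, k)
    if v is None:          # nu_k not reached within the horizon: nothing to check
        return True
    return t is not None and max(t - k, 0) <= v

print(all(check_bound(seed) for seed in range(200)))
```

The bound holds path by path rather than on average, which is why no Monte Carlo tolerance is needed in the check.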

Introduce now the double-sided last entry time

$$\widehat{T}_{k,\varepsilon} = \sup\{n \ge 1 : |n^{-1} Z^k_{k+n-1} - q| > \varepsilon\} \quad (\sup\{\varnothing\} = 0), \tag{4.55}$$

which is the last time when $n^{-1} Z^k_{k+n-1}$ leaves the region $[q - \varepsilon, q + \varepsilon]$. In terms of $\widehat{T}_{k,\varepsilon}$, the almost sure convergence of (4.1) may be written as $P_k\{\widehat{T}_{k,\varepsilon} < \infty\} = 1$ for all $\varepsilon > 0$ and $k \ge 1$, which implies condition (4.2).

If instead of condition (4.47) we impose the condition

$$\sum_{k=1}^{\infty}\pi_k E_k(\widehat{T}_{k,\varepsilon})^r < \infty \quad \text{for all } \varepsilon > 0 \text{ and some } r \ge 1 \tag{4.56}$$

that limits the behavior of both tails of the distribution of the LLR $Z^k_{k+n-1}$, then both conditions (4.2) and (4.47) are satisfied and, therefore, the following corollary holds.

COROLLARY 2. Suppose condition (4.56) is satisfied for some positive finite $q$. Then the asymptotic relations (4.50) and (4.51) hold.

Note that the condition $E_k(\widehat{T}_{k,\varepsilon})^r < \infty$ is nothing but the so-called $r$-quick convergence of $n^{-1} Z^k_{k+n-1}$ to $q$ under $P_k$ (cf. Lai [15, 14] and Tartakovsky [30]). It is closely related to the condition

$$\sum_{n=1}^{\infty} n^{r-1} P_k\{|Z^k_{k+n-1} - qn| > \varepsilon n\} < \infty \quad \text{for all } \varepsilon > 0,$$

which determines the rate of convergence in the strong law of large numbers (cf. Baum and Katz [4] in the i.i.d. case). For $r = 1$, the latter condition is the complete convergence of $n^{-1} Z^k_{k+n-1}$ to $q$ under $P_k$ (cf. Hsu and Robbins [12]).

In particular examples, instead of checking the original condition (4.56), one may check the following condition

$$\sum_{n=1}^{\infty} n^{r-1}\sum_{k=1}^{\infty}\pi_k P_k\{|Z^k_{k+n-1} - qn| > \varepsilon n\} < \infty \quad \text{for all } \varepsilon > 0,$$

which is sufficient for the asymptotic optimality property.
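To see what the series condition asks for in a concrete case, the sketch below evaluates its partial sums for a Gaussian random walk. The model is an assumption made only for illustration: the LLR $Z^k_{k+n-1}$ is taken to be $N(qn, 2qn)$ under $P_k$ for every $k$, so the probability inside the sum does not depend on $k$ and equals $\mathrm{erfc}\big(\varepsilon\sqrt{n}/(2\sqrt{q})\big)$.

```python
import math

def tail_prob(n, q, eps):
    """P{|Z_n - q n| > eps n} when Z_n ~ N(q n, 2 q n) (assumed Gaussian walk)."""
    return math.erfc(eps * math.sqrt(n) / (2.0 * math.sqrt(q)))

def partial_sum(r, q, eps, n_terms):
    """Partial sum of sum_n n^{r-1} P{|Z_n - q n| > eps n}."""
    return sum(n ** (r - 1) * tail_prob(n, q, eps) for n in range(1, n_terms + 1))

# The partial sums stop growing once n is large, for every r tried here,
# which is what r-quick convergence for all r > 0 looks like numerically.
for r in (1, 2, 4):
    print(r, partial_sum(r, q=0.5, eps=0.25, n_terms=2000),
          partial_sum(r, q=0.5, eps=0.25, n_terms=4000))
```

Because the Gaussian tail decays exponentially in $n$, any polynomial weight $n^{r-1}$ is absorbed; with heavier-tailed increments the same experiment would show divergence for large $r$.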

Remark 1. In the i.i.d. case, the finiteness of the $(r+1)$-st absolute moment of the LLR, $E_1|Z_1^1|^{r+1} < \infty$, is both a necessary and sufficient condition for the $r$-quick convergence (4.56). See, e.g., Baum and Katz [4]. Therefore, Theorem 4 implies asymptotic relations (4.16) and (4.17) for $m \le r$ under the $(r+1)$-st moment condition. On the other hand, Theorem 2 shows that these relations hold for all $m > 0$ under the first moment condition alone: $I < \infty$.

Remark 2. The asymptotic approximation (4.50) for the ADD ($m = 1$) ignores the constant $C_\pi = \sum_{k=1}^{\infty}\pi_k|\log\pi_k|$. The proof suggests that preserving this constant may improve the accuracy of the first-order approximation for the ADD, i.e., the following approximate formula

$$\mathrm{ADD}^\pi(\tau_A) \approx q^{-1}(\log A + C_\pi)$$

may be more accurate in particular examples.

5. Examples 

5.1. Detection of a change in the i.i.d. exponential sequence. Let, conditioned on $\lambda = k$, the observations $X_1, \dots, X_{k-1}$ be i.i.d. Exp(1) and $X_k, X_{k+1}, \dots$ be i.i.d. Exp(1/(1 + Q)), i.e.,

$$f_1(x) = \frac{1}{1+Q} e^{-x/(1+Q)} 1_{\{x \ge 0\}}, \quad f_0(x) = e^{-x} 1_{\{x \ge 0\}},$$

where $Q > 0$. Then the partial LLR is $\Delta Z_n = -\log(1+Q) + [Q/(1+Q)]X_n$ and, since the post-change mean of $X_n$ is $1 + Q$, the Kullback-Leibler information number is

$$I = E_1\Delta Z_1 = Q - \log(1+Q).$$

By Theorem 2 the detection test $\tau_{A_\alpha}$ with $A_\alpha = 1/\alpha$ minimizes asymptotically as $\alpha \to 0$ all positive moments of the detection delay.

The distributions of the overshoot $\kappa_b = Z^1_{\eta_b} - b$ in the one-sided, open-ended test $\eta_b$ are exponential for all positive $b$ [33]:

$$P_1(\kappa_b > x) = e^{-x/Q} 1_{\{x \ge 0\}}, \quad P_\infty(\kappa_b > x) = e^{-(1+Q)x/Q} 1_{\{x \ge 0\}}$$

and, therefore,

$$\zeta(Q) = 1/(1+Q), \quad \kappa(Q) = Q.$$

Note that these formulas are exact for any positive $b$, not just asymptotically as $b \to \infty$.
By Lemma 3, if the threshold is set as

$$A_\alpha = \frac{1}{\alpha(1+Q)},$$

then for small $\alpha$

$$P_\infty(\tau_{A_\alpha} < \infty) = \alpha(1 + o(1)),$$

and by (4.33),

$$\mathrm{ADD}^\pi(\tau_{A_\alpha}) \approx \frac{1}{Q - \log(1+Q)}\left(|\log\alpha| - \log(1+Q) + Q + \sum_{k=1}^{\infty}\pi_k|\log\pi_k|\right).$$

If the prior distribution of the point of change is geometric with a parameter $p$,

$$\pi_k = p(1-p)^{k-1}, \quad 0 < p < 1, \quad k \ge 1,$$

then

$$\sum_{k=1}^{\infty}\pi_k|\log\pi_k| = \log\frac{1-p}{p} - \frac{\log(1-p)}{p},$$

and, therefore, the approximation to the average detection delay is given by

$$\mathrm{ADD}^\pi(\tau_{A_\alpha}) \approx \frac{1}{Q - \log(1+Q)}\left\{|\log\alpha| - \log(1+Q) + Q + \log\frac{1-p}{p} - \frac{\log(1-p)}{p}\right\}.$$
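The closed form for $\sum_k \pi_k|\log\pi_k|$ under the geometric prior, and the resulting ADD approximation, can be evaluated directly. This is a small sketch with hypothetical function names; the Kullback-Leibler number is computed as the post-change mean of $\Delta Z_n$, and the parameter values in the usage lines are arbitrary.

```python
import math

def c_pi_closed_form(p):
    """sum_k pi_k |log pi_k| for pi_k = p (1-p)^(k-1), closed form."""
    return math.log((1 - p) / p) - math.log(1 - p) / p

def c_pi_numeric(p, terms=5000):
    """Truncated series; log pi_k is formed analytically to avoid log(0)."""
    total = 0.0
    for k in range(1, terms + 1):
        log_pi = math.log(p) + (k - 1) * math.log(1 - p)
        total += math.exp(log_pi) * abs(log_pi)
    return total

def add_approx(alpha, Q, p):
    """ADD approximation with the constants kept:
    (|log alpha| - log(1+Q) + Q + C_pi) / I, where I = E_1[dZ_1]."""
    kl = Q - math.log1p(Q)   # post-change mean of dZ_n = -log(1+Q) + Q X_n/(1+Q)
    return (abs(math.log(alpha)) - math.log1p(Q) + Q + c_pi_closed_form(p)) / kl

print(c_pi_closed_form(0.1), c_pi_numeric(0.1))   # the two values should agree
print(add_approx(alpha=1e-3, Q=1.0, p=0.1))
```

For small $p$ the prior term dominates the overshoot corrections, so dropping $C_\pi$ from the approximation can be costly when the change is expected late.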

Note also that in the case of i.i.d. observations the detection statistic $G_n$ obeys the recursion

$$G_n = (G_{n-1} - \Pi_{n+1}) e^{\Delta Z_n} + \Pi_{n+1}, \quad G_0 = 1,$$

where $\Pi_{n+1} = (1-p)^n$ for the geometric prior distribution.
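The recursion can be compared against a direct evaluation of the statistic. The sketch below assumes the form $G_n = \sum_{k=1}^{n}\pi_k e^{Z_n^k} + \Pi_{n+1}$ for the statistic (the definition is given earlier in the paper and is not restated in this section) and uses the exponential model of this example; function names and parameter values are illustrative.

```python
import math
import random

def llr_increment(x, Q):
    """dZ_n = -log(1+Q) + Q/(1+Q) * X_n for the exponential model."""
    return -math.log1p(Q) + Q / (1.0 + Q) * x

def g_recursive(xs, Q, p):
    """G_n = (G_{n-1} - Pi_{n+1}) e^{dZ_n} + Pi_{n+1}, G_0 = 1, Pi_{n+1} = (1-p)^n."""
    g, path = 1.0, []
    for n, x in enumerate(xs, start=1):
        tail = (1 - p) ** n
        g = (g - tail) * math.exp(llr_increment(x, Q)) + tail
        path.append(g)
    return path

def g_direct(xs, Q, p):
    """Assumed definition: sum_{k<=n} pi_k exp(Z_n^k) + Pi_{n+1}; O(n^2) per step."""
    dz = [llr_increment(x, Q) for x in xs]
    path = []
    for n in range(1, len(xs) + 1):
        total = (1 - p) ** n
        for k in range(1, n + 1):
            total += p * (1 - p) ** (k - 1) * math.exp(sum(dz[k - 1:n]))
        path.append(total)
    return path

# Simulated data: change at observation 11, post-change mean 1 + Q.
rng = random.Random(3)
Q, p = 1.0, 0.1
xs = [rng.expovariate(1.0) for _ in range(10)]
xs += [rng.expovariate(1.0 / (1.0 + Q)) for _ in range(30)]
rec, direct = g_recursive(xs, Q, p), g_direct(xs, Q, p)
print(max(abs(a - b) / b for a, b in zip(rec, direct)))  # relative gap
```

The recursive form needs constant work per observation, which is the practical reason it matters for sequential use.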
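5.2. Detection of a change in the mean of a Gaussian autoregressive process. Let $X_n = \theta 1_{\{\lambda \le n\}} + V_n$, $n \ge 1$, where $\theta \ne 0$ is a constant "signal" that appears at an unknown point in time $\lambda$ and $V_n$, $n \ge 1$ is a zero-mean stable Gaussian $p$-th order autoregressive process ("noise") AR($p$) that obeys the recursive relation

$$V_n = \sum_{j=1}^{p}\delta_j V_{n-j} + \xi_n, \quad n \ge 1, \quad V_j = 0 \text{ for } j \le 0,$$

where $\xi_n$, $n \ge 1$ are i.i.d. $N(0, \sigma^2)$ and $1 - \sum_{j=1}^{p}\delta_j y^j = 0$ has no roots inside the unit circle.
For $i \ge 1$, define

$$\widetilde{X}_i = \begin{cases} X_1 & \text{if } i = 1, \\ X_i - \sum_{j=1}^{i-1}\delta_j X_{i-j} & \text{if } 2 \le i \le p, \\ X_i - \sum_{j=1}^{p}\delta_j X_{i-j} & \text{if } i \ge p + 1, \end{cases}$$

and for $i \ge k$ and $k = 1, 2, \dots$, define

$$\widetilde{\theta}_i = \begin{cases} \theta & \text{if } i = k, \\ \theta\big(1 - \sum_{j=1}^{i-k}\delta_j\big) & \text{if } k + 1 \le i \le p + k - 1, \\ \theta\big(1 - \sum_{j=1}^{p}\delta_j\big) & \text{if } i \ge p + k. \end{cases}$$

The conditional pre-change pdf $f_0(X_i|X_1^{i-1})$ is of the form

$$f_0(X_i|X_1^{i-1}) = \frac{1}{\sigma}\varphi\left(\frac{\widetilde{X}_i}{\sigma}\right) \quad \text{for all } i \ge 1,$$

and the conditional post-change pdf $f_1(X_i|X_1^{i-1})$, conditioned on $\lambda = k$, is given by

$$f_1(X_i|X_1^{i-1}) = \frac{1}{\sigma}\varphi\left(\frac{\widetilde{X}_i - \widetilde{\theta}_i}{\sigma}\right) \quad \text{for } i \ge k,$$

where $\varphi(y) = (2\pi)^{-1/2}\exp\{-y^2/2\}$ is the standard normal pdf.
Using these formulas, we easily obtain that the LLR

$$Z_n^k = \frac{1}{\sigma^2}\sum_{i=k}^{n}\widetilde{\theta}_i\widetilde{X}_i - \frac{1}{2\sigma^2}\sum_{i=k}^{n}\widetilde{\theta}_i^2.$$

Write

$$q = \frac{\theta^2}{2\sigma^2}\Big(1 - \sum_{j=1}^{p}\delta_j\Big)^2.$$

Note that, under $P_k$, the LLR process $Z^k_{k+n-1}$, $n \ge 1$ has independent Gaussian increments $\Delta Z_n$. Moreover, the increments are i.i.d. for $n \ge p + 1$ with mean $E_k\Delta Z_n = q$ and variance $2q$. Using this property, it can be shown that $n^{-1} Z^k_{k+n-1}$ converges $r$-quickly to $q$ for all positive $r$ under $P_k$ (see Tartakovsky and Veeravalli [36] for further details and generalizations).

Therefore, Theorem 4 and Corollary 2 can be applied to show that the detection test $\tau_{A_\alpha}$ with $A_\alpha = 1/\alpha$ asymptotically minimizes all positive moments of the detection delay.

Note also that in the "stationary" mode when the stopping time $\tau_A \gg k$, the original problem of detecting a change of the intensity $\theta$ in a correlated Gaussian noise is equivalent to detecting a change of the intensity $\theta(1 - \sum_{j=1}^{p}\delta_j)$ in white Gaussian noise. This is primarily because the original problem allows for whitening without loss of information through the innovations $\widetilde{X}_n$, $n \ge 1$ that contain the same information about the hypotheses $H_k$ and $H_\infty$ as the original sequence $X_n$, $n \ge 1$.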
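The whitening step and the resulting LLR can be written out in a few lines. This is a minimal sketch with hypothetical function names: `whiten` forms the innovations by subtracting the AR prediction (using only the samples available so far), `theta_tilde` is the post-change mean of the innovation, and `llr` accumulates the Gaussian log-likelihood ratio. Indices follow the text (1-based $i$ and $k$), while Python lists are 0-based.

```python
def whiten(x, delta):
    """Innovations X~_i = X_i - sum_{j=1}^{min(p, i-1)} delta_j X_{i-j}.
    x[0] holds X_1 and delta[0] holds delta_1."""
    p = len(delta)
    return [x[i] - sum(delta[j - 1] * x[i - j] for j in range(1, min(p, i) + 1))
            for i in range(len(x))]

def theta_tilde(i, k, theta, delta):
    """Post-change mean of X~_i for i >= k (1-based i and k)."""
    m = min(len(delta), i - k)
    return theta * (1.0 - sum(delta[:m]))

def llr(x, k, n, theta, delta, sigma):
    """Z_n^k = sum_{i=k}^n [theta~_i X~_i / sigma^2 - theta~_i^2 / (2 sigma^2)]."""
    xt = whiten(x, delta)
    total = 0.0
    for i in range(k, n + 1):
        th = theta_tilde(i, k, theta, delta)
        total += th * xt[i - 1] / sigma ** 2 - th ** 2 / (2.0 * sigma ** 2)
    return total

# Noise-free sanity check: with V_n = 0 the innovations equal theta~_i exactly.
delta, theta, k, n = [0.5, -0.2], 2.0, 4, 12
x = [0.0] * (k - 1) + [theta] * (n - k + 1)
xt = whiten(x, delta)
print([round(v, 6) for v in xt[k - 1:]])
print([round(theta_tilde(i, k, theta, delta), 6) for i in range(k, n + 1)])
```

After the first $p$ post-change samples the innovation mean settles at $\theta(1 - \sum_j \delta_j)$, which is the "stationary" intensity mentioned above.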
5.3. Detection of additive changes in state-space hidden Markov models. Consider the linear state-space hidden Markov model where the unobserved $m$-dimensional Markov component $\theta_n$ is given by the recursion

$$\theta_n = F\theta_{n-1} + W_{n-1} + \nu_\theta 1_{\{\lambda \le n\}}, \quad n \ge 1, \quad \theta_0 = 0,$$

and the observed $r$-dimensional component

$$X_n = \theta_n + V_n + \nu_x 1_{\{\lambda \le n\}}, \quad n \ge 1.$$

Here $W_n$ and $V_n$ are zero-mean Gaussian i.i.d. vectors having covariance matrices $K_W$ and $K_V$, respectively; $\nu_\theta = (\nu_\theta^1, \dots, \nu_\theta^m)$ and $\nu_x = (\nu_x^1, \dots, \nu_x^r)$ are vectors of the corresponding change intensities; and $F$ is an $m \times m$ matrix.

It can be shown that under the no-change hypothesis the observed sequence $X_n$, $n \ge 1$ has an equivalent representation with respect to the innovation process $\xi_n = X_n - E(\theta_n|\mathcal{F}_{n-1})$, $n \ge 1$:

$$X_n = \widehat{\theta}_n + \xi_n, \quad n \ge 1,$$

where $\xi_n \sim N(0, \Sigma_n)$, $n = 1, 2, \dots$ are independent Gaussian vectors and $\widehat{\theta}_n = E(\theta_n|\mathcal{F}_{n-1})$ (cf., e.g., Tartakovsky [28]). Note that $\widehat{\theta}_n$ is the optimal (in the mean-square sense) one-step ahead predictor, i.e., the estimate of $\theta_n$ based on $X_1^{n-1}$, which can be obtained by the Kalman filter. Under the hypothesis "$H_k : \lambda = k$",

$$X_n = \delta_n(k) + \widehat{\theta}_n + \xi_n, \quad n \ge 1,$$

where $\delta_n(k)$ depends on $n$ and the change point $k$. The value of $\delta_n(k)$ can be computed using relations given, e.g., in Basseville and Nikiforov [3].
It follows that the LLR $Z_n^k$ is given by

$$Z_n^k = \sum_{i=k}^{n}\delta_i(k)^T\Sigma_i^{-1}\xi_i - \frac{1}{2}\sum_{i=k}^{n}\delta_i(k)^T\Sigma_i^{-1}\delta_i(k),$$

where $\Sigma_i$ are given by Kalman equations (see, e.g., (3.2.20) in [3]). Therefore, the original abrupt change detection problem that occurs at $\lambda = k$ is equivalent to detecting a gradual change from zero to $\delta_i(k)$, $i \ge k$ in the sequence of independent Gaussian innovations $\xi_i$ with the covariance matrices $\Sigma_i$. These innovations can be formed by the Kalman filter. Note also that since the post-change distribution depends on the change point $k$ through the value of $\delta_n(k)$, there is no efficient recursive formula for the statistic $G_n$ as in the i.i.d. case.

As $n \to \infty$, the normalized LLR $n^{-1} Z^k_{k+n-1}$ converges almost surely under $P_k$ to the positive constant

$$q = \frac{1}{2}\lim_{n\to\infty}\frac{1}{n}\sum_{i=k}^{k+n-1}\delta_i(k)^T\Sigma_i^{-1}\delta_i(k).$$

Using [3], we obtain that this constant is given by

q = i {(2l m - F*)-V e + [I r - (^I m - F^^FK]^} , 

where $K$ is the gain in the Kalman filter in the stationary regime, $I_m$ is the unit $m \times m$ matrix, and $F^* = F(I_m - K)$.

Moreover, since the process $Z^k_{k+n-1}$, $n \ge 1$ is Gaussian with independent increments, $n^{-1} Z^k_{k+n-1}$ converges strongly completely to $q$ (i.e., $r$-quickly for all $r > 0$, see Tartakovsky [30]). Therefore, Corollary 2 shows that the detection test $\tau_{A_\alpha}$ is asymptotically optimal as $\alpha \to 0$ with respect to all positive moments of the detection delay.
5.4. Detection of non-additive changes in mixture and HMM models. In the previous two examples the changes were additive. Consider now an example with non-additive changes where the observations are i.i.d. in the "out-of-control" mode and mixture-type dependent in the "in-control" mode. This example was used by Mei [18] as a counterexample to disprove that the CUSUM and SRP detection tests are asymptotically optimal in the minimax setting with the lower bound on the mean time to false alarm. However, we show below that the proposed Bayesian test is asymptotically optimal. This primarily happens because the strong law of large numbers still holds for the problem considered, while a stronger essential supremum condition (cf. Lai [16]), which is required for obtaining a lower bound for the minimax average detection delay, fails.

Let $g_1(X_n)$, $g_2(X_n)$, and $f_1(X_n)$ be three distinct densities. The problem is to detect the change from the mixture density

$$f_0(X_1^n) = \beta\prod_{i=1}^{n} g_1(X_i) + (1 - \beta)\prod_{i=1}^{n} g_2(X_i)$$

to the density $f_1$, where $0 < \beta < 1$ is a mixing probability. Therefore, the observations are dependent with the joint pdf $f_0(X_1^n)$ before the change occurs and i.i.d. with the density $f_1$ after the change occurs.

Denote $R_j(n) = \log[f_1(X_n)/g_j(X_n)]$ and $I_j = E_1 R_j(1)$, $j = 1, 2$.

It is easy to show that

$$\frac{f_1(X_i)}{f_0(X_i|X_1^{i-1})} = e^{R_2(i)}\,\frac{\beta\xi_{i-1} + 1 - \beta}{\beta\xi_i + 1 - \beta},$$

where $\xi_i = \prod_{m=1}^{i}\Delta\xi_m$, $\Delta\xi_m = g_1(X_m)/g_2(X_m)$. Next, note that

$$\frac{\beta\xi_{i-1} + 1 - \beta}{\beta\xi_i + 1 - \beta} = \frac{1 + v\xi_{i-1}}{1 + v\xi_i},$$

where $v = \beta/(1 - \beta)$, so that the LLR

$$Z_n^k := \sum_{i=k}^{n}\log\frac{f_1(X_i)}{f_0(X_i|X_1^{i-1})} = \sum_{i=k}^{n} R_2(i) + \log\frac{1 + v\xi_{k-1}}{1 + v\xi_n}. \tag{5.1}$$

Assume that $I_1 > I_2$, in which case the expectation $E_k\log\Delta\xi_m < 0$ for $k \le m$ and, hence,

$$\xi_n = \xi_{k-1}\prod_{m=k}^{n}\Delta\xi_m \xrightarrow[n\to\infty]{P_k\text{-a.s.}} 0 \quad \text{for every } k < \infty.$$

The condition (4.2), which is required for the lower bound (4.7), holds with the constant $q = I_2$. Indeed, since $R_2(i)$, $i \ge k$ are i.i.d. random variables under $P_k$ with mean $I_2$ and since $\xi_n \to 0$, the LLR obeys the strong law of large numbers:

$$\frac{1}{n} Z^k_{k+n-1} \to I_2 \quad P_k\text{-a.s. as } n \to \infty,$$

which implies (4.2) with $q = I_2$ and, hence, the lower bound (4.7):

$$\inf_{\tau\in\Delta_\infty(\alpha)} D_m^\pi(\tau) \ge \left(\frac{|\log\alpha|}{I_2}\right)^m (1 + o(1)) \quad \text{as } \alpha \to 0 \text{ for all } m > 0.$$
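The telescoping identity (5.1) can be verified numerically by computing the LLR twice: once from the conditional densities $f_0(X_i|X_1^{i-1}) = f_0(X_1^i)/f_0(X_1^{i-1})$, and once from the closed form. The sketch below is illustrative only; it assumes unit-variance Gaussian densities for $g_1$, $g_2$ and $f_1$ with hypothetical means, and the data are arbitrary numbers.

```python
import math

def norm_pdf(x, mu):
    """Unit-variance Gaussian density (an illustrative choice of g1, g2, f1)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def f0_joint(xs, beta, mu1, mu2):
    """Pre-change mixture density of the block xs (equals 1 for an empty block)."""
    return (beta * math.prod(norm_pdf(x, mu1) for x in xs)
            + (1 - beta) * math.prod(norm_pdf(x, mu2) for x in xs))

def llr_direct(xs, k, n, beta, mu1, mu2, mu_f):
    """Z_n^k from the conditional densities f0(X_i | X_1^{i-1})."""
    total = 0.0
    for i in range(k, n + 1):
        cond = f0_joint(xs[:i], beta, mu1, mu2) / f0_joint(xs[:i - 1], beta, mu1, mu2)
        total += math.log(norm_pdf(xs[i - 1], mu_f) / cond)
    return total

def llr_formula(xs, k, n, beta, mu1, mu2, mu_f):
    """Right-hand side of (5.1): sum of R_2(i) plus the log-ratio correction."""
    v = beta / (1 - beta)

    def xi(m):
        # xi_m = prod_{j<=m} g1(X_j)/g2(X_j), with xi_0 = 1
        return math.prod(norm_pdf(x, mu1) / norm_pdf(x, mu2) for x in xs[:m])

    r2 = sum(math.log(norm_pdf(xs[i - 1], mu_f) / norm_pdf(xs[i - 1], mu2))
             for i in range(k, n + 1))
    return r2 + math.log((1 + v * xi(k - 1)) / (1 + v * xi(n)))

xs = [0.3, -1.1, 0.8, 1.9, 0.2, -0.4, 1.2, 0.7, 2.1, 0.9, 1.4, 0.1]
args = dict(beta=0.3, mu1=-1.0, mu2=0.0, mu_f=1.0)
for k in (1, 4, 9):
    print(k, llr_direct(xs, k, len(xs), **args), llr_formula(xs, k, len(xs), **args))
```

The closed form needs only the running product $\xi_n$, so the dependence introduced by the mixture does not prevent a cheap sequential update of the LLR.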
Next, using (4.40) and (5.1), we can write the statistic $\log G_n$ in the following form

$$\log G_n = \sum_{i=k}^{n} R_2(i) + \psi(k, n) + \log[\pi_k(1 + Y_n^k)],$$

where

$$\psi(k, n) = \log\frac{1 + v\xi_{k-1}}{1 + v\xi_n}.$$

The sequence $Y_n^k$, $n \ge k$ is slowly changing by the argument given in the proof of Theorem 3. The sequence $\psi(k, n)$, $n \ge k$ is also slowly changing. In fact, since $\xi_n \to 0$ w.p. 1, it converges to the finite random variable $\log(1 + v\xi_{k-1})$. Therefore, by the nonlinear renewal theorem [37], the detection delay of $\tau_{A_\alpha}$ attains the above lower bound, and the detection procedure $\tau_{A_\alpha}$ is asymptotically optimal.

Note that the results of Tartakovsky and Veeravalli [36] suggest that the Shiryaev detection procedure is also asymptotically optimal under the traditional constraint on the average false alarm probability. On the other hand, as we mentioned above, the minimax property of the CUSUM and Shiryaev-Roberts tests does not hold in the example considered.

Finally, we note that the above simple mixture model is a degenerate case of a more general model governed by a two-state HMM when transition probabilities between states are equal to zero and the initial distribution is given by the probability $\beta$. The proposed Bayesian procedure (as well as the Shiryaev procedure in the conventional setting) remains asymptotically optimal for the model where the pre-change distribution is controlled by a finite-state (non-degenerate) HMM, while the post-change model is i.i.d. On the other hand, the condition C1 of Fuh [9] does not hold and, therefore, one may not conclude that the CUSUM test is minimax asymptotically optimal under the constraint on the average run length to false alarm. For such a model, the minimax asymptotic optimality property of the CUSUM is an open problem. Simulation results show that the performance of the CUSUM test is poor at least for the moderate false alarm rate, while the performance of the Bayesian tests is high. Further details will be presented elsewhere.

6. Concluding Remarks

1. As we already mentioned in the introduction, the global false alarm probability constraint $\sup_k P_k(\tau < k) = P_\infty(\tau < \infty) \le \alpha$ leads to an unbounded worst-case expected detection delay $\sup_k E_k(\tau - k|\tau \ge k)$ whenever $\alpha < 1$ due to a high price that should be paid for such a strong constraint. Note that to overcome this difficulty in a minimax setting a dynamic sampling technique can be used when it is feasible (cf. Assaf et al. [1]). At the expense of a large amount of data that must be sampled, the worst-case average detection delay may then be made bounded, while keeping the global PFA below the given small level. However, dynamic sampling is rarely possible in applications. We, therefore, considered a Bayesian problem with the prior distribution. The proposed asymptotically Bayesian detection test can be regarded as the Shiryaev detection procedure with a threshold that increases in time. The need for the threshold increase is due to the strong constraint imposed on the global PFA in place of the average PFA constraint used in Shiryaev's classical problem setting.

2. While the results of the present paper may be used to devise a reasonably simple detection procedure to handle the global probability bound on false alarms, the author's personal opinion is that this constraint is too strong to be useful in applications. In fact, the conditional ADD $E_k(\tau_A - k|\tau_A \ge k)$ of the proposed detection procedure grows fairly fast with $k$, and the "nice" property that the Bayesian ADD is as small as possible (for small $\alpha$) perhaps will not convince practitioners of the usefulness of the test. In addition, the mean time to false alarm in this detection procedure is unbounded, which is an unavoidable price for the very strong global PFA constraint.

3. Taking into account the previous remark, we argue that imposing the bound on the local PFA $\sup_k P_\infty(k \le \tau \le k + T - 1)$ or on the local conditional PFA $\sup_k P_\infty(k \le \tau \le k + T - 1|\tau \ge k)$ is a much more practical approach. The latter conditional PFA is indeed a proper measure of false alarms in a variety of surveillance problems, as was discussed in Tartakovsky [32]. It can then be shown that the conventional CUSUM and SRP detection tests are optimal in the minimax sense for any time window $T$, and asymptotically uniformly optimal (i.e., for all $k \ge 1$) if the size of the window $T$ goes to infinity at a certain rate (cf. Lai [15, 16] and Tartakovsky [32]).

4. The sufficient conditions for asymptotic optimality postulated in Theorems 1 and 4 are quite general and hold in most applications. We verified these conditions for the three examples that cover both additive and non-additive changes in non-i.i.d. models. While we are not aware of non-i.i.d. models reasonable for practical applications for which these conditions do not hold, such examples may still exist. However, we believe that such situations should be handled on a case by case basis.

5. Similar results can be proved for general continuous-time stochastic models. A proof of the lower bound for moments of the detection delay is identical to the proof of Theorem 1. However, derivation of the upper bound is not straightforward and requires certain additional conditions analogous to those used in Baron and Tartakovsky [2].
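Appendix

Proof of Lemma 2. Define $\gamma^k_{\varepsilon,\alpha}(\tau) = P_k\{k \le \tau < k + (1 - \varepsilon)L_\alpha\}$, where $L_\alpha = q^{-1}|\log\alpha|$. A quite tedious argument analogous to that used in the proof of Lemma 1 of Tartakovsky and Veeravalli [36] yields

$$\gamma^k_{\varepsilon,\alpha}(\tau) \le e^{(1-\varepsilon^2)qL_\alpha} P_\infty\{k \le \tau < k + (1 - \varepsilon)L_\alpha\} + P_k\Big\{\max_{1 \le n < (1-\varepsilon)L_\alpha} Z^k_{k+n-1} \ge (1 - \varepsilon^2)qL_\alpha\Big\}.$$

Since $P_\infty\{k \le \tau < k + (1 - \varepsilon)L_\alpha\} \le P_\infty(\tau < \infty) \le \alpha$ for any $\tau \in \Delta_\infty(\alpha)$ and $e^{(1-\varepsilon^2)qL_\alpha}\alpha = \alpha^{\varepsilon^2}$, we obtain

$$\gamma^k_{\varepsilon,\alpha}(\tau) \le \alpha^{\varepsilon^2} + \beta_k(\alpha, \varepsilon), \tag{6.1}$$

where

$$\beta_k(\alpha, \varepsilon) = P_k\Big\{\max_{1 \le n < (1-\varepsilon)L_\alpha} Z^k_{k+n-1} \ge (1 - \varepsilon^2)qL_\alpha\Big\}.$$

Let $N_\alpha = \lfloor\varepsilon L_\alpha\rfloor$ be the greatest integer number $\le \varepsilon L_\alpha$. Evidently,

$$\gamma^\pi_{\varepsilon,\alpha}(\tau) = \sum_{k=1}^{\infty}\pi_k\gamma^k_{\varepsilon,\alpha}(\tau) \le \sum_{k=1}^{N_\alpha}\pi_k\gamma^k_{\varepsilon,\alpha}(\tau) + \Pi_{N_\alpha+1}$$

and, therefore,

$$\gamma^\pi_{\varepsilon,\alpha}(\tau) \le \Pi_{N_\alpha+1} + \alpha^{\varepsilon^2} + \sum_{k=1}^{N_\alpha}\pi_k\beta_k(\alpha, \varepsilon). \tag{6.2}$$

The first two terms go to 0 as $\alpha \to 0$ for any $\varepsilon > 0$. The third term goes to zero as $\alpha \to 0$ by condition (4.2) and Lebesgue's dominated convergence theorem. Since the right side in (6.2) does not depend on $\tau$, this completes the proof of (4.3).

Using the inequality $P_\infty(\tau_A < \infty) \le 1/A$ and applying the same argument as above shows that

$$\gamma^\pi_{\varepsilon,A}(\tau_A) \le \Pi_{N_A+1} + 1/A^{\varepsilon^2} + \sum_{k=1}^{N_A}\pi_k\beta_k(A, \varepsilon), \tag{6.3}$$

where $N_A = \lfloor\varepsilon L_A\rfloor$ with $L_A = q^{-1}\log A$ and

$$\beta_k(A, \varepsilon) = P_k\Big\{\max_{1 \le n < (1-\varepsilon)L_A} Z^k_{k+n-1} \ge (1 - \varepsilon^2)\log A\Big\}.$$

Again all three terms on the right-hand side of (6.3) tend to zero as $A \to \infty$, which proves (4.4).

□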